Belief Revision


Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

Neural Information Processing Systems

Belief propagation is a fundamental message-passing algorithm for probabilistic reasoning and inference in graphical models. While it is known to be exact on trees, in most applications belief propagation is run on graphs with cycles. Understanding the behavior of "loopy" belief propagation has been a major challenge for researchers in machine learning and other fields, and positive convergence results for BP are known under strong assumptions that imply the underlying graphical model exhibits decay of correlations. We show, building on previous work of Dembo and Montanari, that under a natural initialization BP converges quickly to the global optimum of the Bethe free energy for Ising models on arbitrary graphs, as long as the Ising model is ferromagnetic (i.e., all pairwise interactions are attractive, so neighboring spins prefer to align).
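For concreteness, here is a minimal sketch of parallel loopy BP on a small ferromagnetic Ising model, with messages in their standard log-odds parameterization. The 4-cycle graph, couplings, external fields, and the uniform message initialization are illustrative assumptions; the paper's specific initialization and analysis are not reproduced here.

```python
# Minimal sketch of loopy belief propagation on a ferromagnetic Ising model
# (all couplings J_ij >= 0), using the standard log-odds message fields for
# pairwise binary models.  Graph, couplings, and initialization are illustrative.
import numpy as np

def bp_ising(edges, J, h, n, iters=200, init=1.0):
    """edges: list of (i, j); J: couplings keyed by (i, j); h: length-n external fields."""
    nu, coup = {}, {}
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        coup[(i, j)] = coup[(j, i)] = J[(i, j)]
        nu[(i, j)] = nu[(j, i)] = init          # directed message field i -> j
    for _ in range(iters):
        new = {}
        for (i, j) in nu:
            # Cavity field at i excluding the message coming back from j.
            cavity = h[i] + sum(nu[(k, i)] for k in nbrs[i] if k != j)
            new[(i, j)] = np.arctanh(np.tanh(coup[(i, j)]) * np.tanh(cavity))
        nu = new
    # Node beliefs P(x_i = +1) at the (approximate) fixed point reached.
    fields = np.array([h[i] + sum(nu[(k, i)] for k in nbrs[i]) for i in range(n)])
    return 1.0 / (1.0 + np.exp(-2.0 * fields))

# Example: a 4-cycle (a loopy graph) with uniform ferromagnetic couplings and a
# small positive field on one node, which propagates around the cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = {e: 0.4 for e in edges}
h = np.array([0.5, 0.0, 0.0, 0.0])
print(bp_ising(edges, J, h, n=4))
```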



Transformers Represent Belief State Geometry in their Residual Stream
Adam S. Shai, Sarah E. Marzen, Lucas Teixeira (Simplex; Department of Natural Sciences; PIBBSS)

Neural Information Processing Systems

What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stream of transformers, even in cases where the predicted belief state geometry has highly nontrivial fractal structure. We investigate cases where the belief state geometry is represented in the final residual stream or distributed across the residual streams of multiple layers, providing a framework to explain these observations. Furthermore, we demonstrate that the inferred belief states contain information about the entire future, beyond the local next-token prediction that the transformers are explicitly trained on. Our work provides a general framework connecting the structure of training data to the geometric structure of activations inside transformers.
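As a toy illustration of the belief-updating meta-dynamics the abstract refers to, the sketch below runs Bayesian filtering over the hidden states of a small hidden Markov data-generating process; the resulting sequence of belief vectors is the kind of object the paper looks for (linearly) in transformer activations. The 3-state process and token sequence are arbitrary assumptions, and the regression against residual-stream activations is not shown.

```python
# Minimal sketch of the belief-state ("mixed state") update for a hidden Markov
# data-generating process: after each observed token, the optimal predictor's
# belief over hidden states is updated by Bayes' rule.  The 3-state process
# below is an arbitrary illustration, not one of the processes studied in the paper.
import numpy as np

# T[x] is the labeled transition matrix: T[x][i, j] = P(next state j, emit x | state i).
T = {
    0: np.array([[0.0, 0.5, 0.0],
                 [0.0, 0.0, 0.5],
                 [0.5, 0.0, 0.0]]),
    1: np.array([[0.5, 0.0, 0.0],
                 [0.5, 0.0, 0.0],
                 [0.0, 0.5, 0.0]]),
}

def belief_trajectory(tokens, T, prior):
    """Return the sequence of belief states visited while reading `tokens`."""
    b = prior.copy()
    beliefs = [b]
    for x in tokens:
        b = b @ T[x]            # weight by transition/emission probabilities
        b = b / b.sum()         # renormalize to a distribution over hidden states
        beliefs.append(b)
    return np.array(beliefs)

prior = np.array([1/3, 1/3, 1/3])          # illustrative starting belief
print(belief_trajectory([0, 1, 1, 0], T, prior))
```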




A Logic of Uncertain Interpretation

arXiv.org Artificial Intelligence

We do not always know how to interpret the statements that we hear, the observations that we make, or the evidence that we gather. Traditional frameworks for reasoning about uncertainty and belief revision typically suppose that new information is presented definitively: there is no question about what was learned. The paradigm of Bayesian conditioning exemplifies this assumption: "evidence" takes the simple form of an event E, and belief revision proceeds by updating probabilities accordingly: π ↦ π(· | E). In order to capture the kind of uncertainty about interpretation we wish to reason about, we change the fundamental representation of events so that the sets they correspond to are themselves variable--the "true meaning" of a statement thus becomes itself an object of uncertainty. This approach follows in the spirit of other recent work [1, 2], expanding on it along two key dimensions.
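A minimal sketch of the contrast being drawn: standard conditioning on a crisp event versus an update in which the event itself is uncertain. The mixture-over-interpretations rule used here is only an illustrative choice, not the semantics proposed in the paper.

```python
# Minimal sketch contrasting standard Bayesian conditioning pi -> pi(. | E)
# with an update in which the event to condition on is itself uncertain.
# Averaging the conditionals over candidate interpretations is one simple
# illustrative rule; it is not claimed to be the paper's proposed update.
import numpy as np

def condition(pi, E):
    """Standard conditioning on a crisp event E (a 0/1 mask over outcomes)."""
    post = pi * E
    return post / post.sum()

def condition_uncertain(pi, interpretations):
    """`interpretations` is a list of (weight, event-mask) pairs with weights summing to 1."""
    return sum(w * condition(pi, E) for w, E in interpretations)

# Outcomes 0..3 with a non-uniform prior.
pi = np.array([0.4, 0.3, 0.2, 0.1])
E1 = np.array([1, 1, 0, 0])          # one reading of the statement
E2 = np.array([1, 1, 1, 0])          # a broader reading
print(condition(pi, E1))
print(condition_uncertain(pi, [(0.7, E1), (0.3, E2)]))
```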


A modal logic translation of the AGM axioms for belief revision

arXiv.org Artificial Intelligence

Building on the analysis of Bonanno (Artificial Intelligence, 2025), we introduce a simple modal logic containing three modal operators: a unimodal belief operator, a bimodal conditional operator, and the unimodal global operator. For each AGM axiom for belief revision, we provide a corresponding modal axiom. The correspondence is as follows: each AGM axiom is characterized by a property of the Kripke-Lewis frames considered in Bonanno (Artificial Intelligence, 2025) and, in turn, that property characterizes the proposed modal axiom.
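For readers less familiar with AGM revision, the sketch below uses the standard plausibility-ordering (sphere-style) semantics: revising by a formula keeps the most plausible worlds satisfying it, which makes postulates such as success easy to check. The two-atom world set and the ordering are illustrative assumptions; the paper's modal language and axioms are not reproduced here.

```python
# Minimal sketch of the semantic picture behind AGM revision: beliefs are the
# most plausible worlds, and revising by a formula keeps the most plausible
# worlds that satisfy it.  Worlds and the plausibility order are illustrative.
from itertools import product

worlds = list(product([False, True], repeat=2))   # valuations of atoms p, q
plaus = {w: i for i, w in enumerate(worlds)}      # lower value = more plausible (arbitrary order)

def revise(phi):
    """Belief set after revising by `phi` (a predicate on worlds)."""
    candidates = [w for w in worlds if phi(w)]
    best = min(plaus[w] for w in candidates)
    return {w for w in candidates if plaus[w] == best}

p = lambda w: w[0]
q = lambda w: w[1]

K_star_p = revise(p)
# AGM "success": after revising by p, every believed world satisfies p.
assert all(p(w) for w in K_star_p)
print(K_star_p, revise(q))
```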


Constraints Based Convex Belief Propagation

Neural Information Processing Systems

Inference in Markov random fields subject to consistency structure is a fundamental problem that arises in many real-life applications. In order to enforce consistency, classical approaches utilize consistency potentials or encode constraints over feasible instances. Unfortunately, this comes at the price of a serious computational bottleneck. In this paper we propose to tackle consistency by incorporating constraints directly on the beliefs. This permits the derivation of a closed-form message-passing algorithm, which we refer to as Constraints Based Convex Belief Propagation (CBCBP).
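To illustrate the flavor of constraining beliefs directly rather than adding consistency potentials, the sketch below KL-projects a set of independent binary beliefs onto a linear "exactly one is on" constraint using a single dual variable found by bisection. This is only a toy projection step under assumed Bernoulli beliefs, not the CBCBP message-passing derivation itself.

```python
# Minimal sketch of a belief-level constraint: given independent binary beliefs
# b_i = P(x_i = 1), find the closest (in KL divergence) beliefs q_i satisfying
# sum_i q_i = 1.  The constrained solution shifts every logit by one dual
# variable, which we locate by bisection.  Not the CBCBP update equations.
import numpy as np

def project_exactly_one(b, lo=-30.0, hi=30.0, iters=60):
    """b: array of P(x_i = 1).  Returns q minimizing sum_i KL(q_i || b_i)
    subject to sum_i q_i = 1 (an 'exactly one variable is on' constraint)."""
    logit = np.log(b) - np.log(1.0 - b)
    for _ in range(iters):                      # bisection on the dual variable
        lam = 0.5 * (lo + hi)
        total = (1.0 / (1.0 + np.exp(-(logit + lam)))).sum()
        lo, hi = (lam, hi) if total < 1.0 else (lo, lam)
    return 1.0 / (1.0 + np.exp(-(logit + 0.5 * (lo + hi))))

b = np.array([0.9, 0.6, 0.4])      # unconstrained beliefs that each x_i = 1
q = project_exactly_one(b)
print(q, q.sum())                  # constrained beliefs now sum to ~1
```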


Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation

Neural Information Processing Systems

The stochastic block model (SBM) has long been studied in machine learning and network science as a canonical model for clustering and community detection. In recent years, new developments have demonstrated the presence of threshold phenomena for this model, which have set new challenges for algorithms. Achieving the Kesten-Stigum (KS) threshold was proved possible for two communities, but remained open for three or more communities. We prove this conjecture here, obtaining a more general result that applies to arbitrary SBMs with linear-size communities. The developed algorithm is a linearized acyclic belief propagation (ABP) algorithm, which mitigates the effects of cycles while provably achieving the KS threshold in O(n log n) time.
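A minimal sketch of the setting: sample a two-community symmetric SBM and check the Kesten-Stigum condition (a - b)^2 > 2(a + b), where a/n and b/n are the within- and across-community edge probabilities. The parameters are illustrative, and the linearized ABP algorithm itself is not reproduced here.

```python
# Minimal sketch: sample a two-community symmetric SBM and evaluate the
# Kesten-Stigum detectability condition (a - b)^2 > 2(a + b).  The community
# detection algorithm (linearized ABP) is not reproduced; parameters are illustrative.
import numpy as np

def sample_sbm(n, a, b, rng):
    """Symmetric SBM with intra-/inter-community edge probabilities a/n and b/n."""
    labels = rng.integers(0, 2, size=n)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return labels, upper | upper.T

def above_ks_threshold(a, b):
    return (a - b) ** 2 > 2 * (a + b)

rng = np.random.default_rng(0)
labels, A = sample_sbm(2000, a=8.0, b=2.0, rng=rng)
print("average degree:", A.sum() / 2000, "detectable:", above_ks_threshold(8.0, 2.0))
```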